Large Scale Sparse Clustering

Authors

  • Ruqi Zhang
  • Zhiwu Lu
Abstract

Large-scale clustering has found wide application in many fields and has received much attention in recent years. However, most existing large-scale clustering methods achieve only mediocre performance because they are sensitive to the noise that is unavoidably present in large-scale data. To address this problem, we propose a large-scale sparse clustering (LSSC) algorithm. We adopt a two-step optimization strategy: 1) k-means clustering over the large-scale data to obtain initial clustering results; 2) refinement of the initial results with a sparse coding algorithm that we develop. To keep the second step scalable to large-scale data, we also use nonlinear approximation and dimension reduction techniques to speed up the sparse coding algorithm. Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm.
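
The two-step strategy described in the abstract can be illustrated with a small sketch. This is not the authors' LSSC formulation, which the abstract does not spell out. As an assumption made purely for illustration, step 2 here encodes each point as a sparse combination of the k-means centroids (L1-penalized least squares solved with ISTA) and reassigns the point to the centroid with the largest coefficient. The function names `kmeans`, `sparse_codes_ista`, and `refine_labels` and the parameter `lam` are hypothetical, and the speed-up techniques mentioned in the abstract (nonlinear approximation, dimension reduction) are omitted.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Step 1: plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels, centers

def soft_threshold(Z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_codes_ista(X, D, lam=0.1, n_iter=100):
    """Minimize 0.5*||X - A @ D||_F^2 + lam*||A||_1 over A using ISTA.

    X is (n, d); D is a (k, d) dictionary (assumption: the centroids).
    """
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A

def refine_labels(X, labels, centers, lam=0.1):
    """Step 2 (illustrative assumption): reassign by largest sparse coefficient."""
    A = sparse_codes_ista(X, centers, lam=lam)
    refined = np.abs(A).argmax(axis=1)
    # Points whose code is entirely zero keep their initial k-means label.
    all_zero = ~np.any(A != 0, axis=1)
    refined[all_zero] = labels[all_zero]
    return refined
```

A typical call would be `labels, centers = kmeans(X, k)` followed by `refined = refine_labels(X, labels, centers)`. Using the centroids as dictionary atoms only makes sense when cluster directions are reasonably distinct from one another and from the origin; a faithful implementation would follow the formulation in the full paper.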

Similar resources

Sparse Functional Representation for Large-Scale Service Clustering

Service clustering provides an effective means to discover hidden service communities that group services with relevant functionalities. However, the ever increasing number of Web services poses key challenges for building large-scale service communities. In this paper, we address the scalability issue in service clustering, aiming to discover service communities over very large-scale services....

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Efficient Large-Scale Service Clustering via Sparse Functional Representation and Accelerated Optimization

Clustering techniques offer a systematic approach to organize the diverse and fast increasing Web services by assigning relevant services into homogeneous service communities. However, the ever increasing number of Web services poses key challenges for building large-scale service communities. In this paper, we tackle the scalability issue in service clustering, aiming to accurately and efficie...

Efficient Clustering for Large-Scale, Sparse, Discrete Data with Low Fundamental Resolution

Scalable algorithm design has become central in the era of large-scale data analysis. My contribution to this line of research is the design of new algorithms for scalable clustering and data reduction, by exploiting inherent low-dimensional structure in the input data to overcome the challenges of significant amounts of missing entries. I demonstrate that, by focusing on a property of the data...

Fast Subspace Clustering Based on the Kronecker Product

Subspace clustering is a useful technique for many computer vision applications in which the intrinsic dimension of high-dimensional data is often smaller than the ambient dimension. Spectral clustering, as one of the main approaches to subspace clustering, often takes on a sparse representation or a low-rank representation to learn a block diagonal self-representation matrix for subspace gener...

Publication date: 2016